AITopics | chinese translation

chinese translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Disentangling concept semantics via multilingual averaging in Sparse Autoencoders

O'Reilly, Cliff, Jimenez-Ruiz, Ernesto, Weyde, Tillman

arXiv.org Artificial Intelligence, Aug-21-2025

Connecting LLMs with formal knowledge representation and reasoning is a promising approach to address their shortcomings. Embeddings and sparse autoencoders are widely used to represent textual content, but the semantics are entangled with syntactic and language-specific information. We propose a method that isolates concept semantics in Large Language Models by averaging concept activations derived via Sparse Autoencoders. We create English text representations from OWL ontology classes, translate the English into French and Chinese, and then pass these texts as prompts to the Gemma 2B LLM. Using the open-source Gemma Scope suite of Sparse Autoencoders, we obtain concept activations for each class and language version. We average the activations across languages to derive a conceptual average. We then correlate the conceptual averages with a ground-truth mapping between ontology classes. Our results give a strong indication that the conceptual average aligns with the true relationships between classes more closely than the activations from a single language alone. The result hints at a new technique that enables mechanistic interpretation of internal network states with higher accuracy.
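The averaging-and-correlation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each class already has one SAE activation vector per language (plain lists of floats of equal width), uses cosine similarity between averaged class vectors, and uses Pearson correlation against a hypothetical ground-truth score per class pair. The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def conceptual_average(activations_by_language: Dict[str, Sequence[float]]) -> List[float]:
    """Element-wise mean of one class's SAE activation vectors across languages."""
    vectors = list(activations_by_language.values())
    if not vectors:
        raise ValueError("need at least one language version")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all language vectors must have the same SAE width")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def alignment_with_ground_truth(
    class_vectors: Dict[str, List[float]],
    ground_truth: Dict[Tuple[str, str], float],
) -> float:
    """Correlate pairwise class similarities with hypothetical ground-truth pair scores."""
    pairs = sorted(ground_truth)
    sims = [cosine(class_vectors[a], class_vectors[b]) for a, b in pairs]
    truth = [ground_truth[p] for p in pairs]
    return pearson(sims, truth)


# Toy usage: two classes that are related in the ontology, one that is not.
vectors = {
    "A": conceptual_average({"en": [1.0, 0.0], "fr": [1.0, 0.0], "zh": [1.0, 0.0]}),
    "B": conceptual_average({"en": [1.0, 0.2], "fr": [1.0, 0.0], "zh": [1.0, 0.1]}),
    "C": conceptual_average({"en": [0.0, 1.0], "fr": [0.1, 1.0], "zh": [0.0, 1.0]}),
}
truth = {("A", "B"): 1.0, ("A", "C"): 0.0, ("B", "C"): 0.0}
print(round(alignment_with_ground_truth(vectors, truth), 3))
```

Comparing this score for the averaged vectors against the same score computed from a single language's vectors mirrors, in spirit, the comparison the abstract describes.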

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2508.14275

Genre:

Research Report > New Finding (0.35)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

Min, Hyangsuk, Lee, Yuho, Ban, Minjeong, Deng, Jiaqi, Kim, Nicole Hee-Yeon, Yun, Taewon, Su, Hang, Cai, Jason, Song, Hwanjun

arXiv.org Artificial Intelligence, Jun-3-2025

Evaluation frameworks for text summarization have evolved in terms of both domain coverage and metrics. However, existing benchmarks still lack domain-specific assessment criteria, remain predominantly English-centric, and face challenges in human annotation because of the complex reasoning involved. To address these issues, we introduce MSumBench, which provides a multi-dimensional, multi-domain evaluation of summarization in English and Chinese. It also incorporates specialized assessment criteria for each domain and leverages a multi-agent debate system to enhance annotation quality. By evaluating eight modern summarization models, we discover distinct performance patterns across domains and languages. We further examine large language models as summary evaluators, analyzing the correlation between their evaluation and summarization capabilities, and uncovering systematic bias in their assessment of self-generated summaries. Our benchmark dataset is publicly available at https://github.com/DISL-Lab/MSumBench.
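One simple way to quantify the self-assessment bias mentioned above is to compare the mean score an evaluator model gives to its own summaries with the mean it gives to other models' summaries. The sketch below is a hypothetical illustration, not MSumBench's protocol: it assumes scores arrive as `(judge_model, summary_author_model, score)` triples, and the "gap" definition is our own simplification.

```python
from statistics import mean
from typing import Dict, List, Tuple


def self_preference_gap(scores: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Per judge: mean score on its own summaries minus mean score on others' summaries.

    Judges that never scored their own summaries (or never scored anyone else's)
    are omitted, since the gap is undefined for them.
    """
    own: Dict[str, List[float]] = {}
    other: Dict[str, List[float]] = {}
    for judge, author, score in scores:
        bucket = own if judge == author else other
        bucket.setdefault(judge, []).append(score)
    return {j: mean(own[j]) - mean(other[j]) for j in own if j in other}


# Hypothetical scores on a 1-5 scale; a positive gap suggests self-preference.
example = [
    ("model_a", "model_a", 4.0),
    ("model_a", "model_b", 3.0),
    ("model_a", "model_c", 3.0),
    ("model_b", "model_b", 3.5),
    ("model_b", "model_a", 3.5),
]
print(self_preference_gap(example))
```

A positive gap on its own does not prove bias, since a model's own summaries might genuinely be better; a fuller analysis would compare against human or cross-model judgments of the same summaries.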

large language model, machine learning, reference document, (23 more...)

arXiv.org Artificial Intelligence

2506.00549

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Government > Regional Government > North America Government > United States Government (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sing it, Narrate it: Quality Musical Lyrics Translation

Ye, Zhuorui, Li, Jinhan, Xu, Rongwu

arXiv.org Artificial Intelligence, Oct-29-2024

Translating lyrics for musicals presents unique challenges due to the need to ensure high translation quality while adhering to singability requirements such as length and rhyme. Existing song translation approaches often prioritize these singability constraints at the expense of translation quality, which is crucial for musicals. This paper aims to enhance translation quality while maintaining key singability features. Our method consists of three main components. First, we create a dataset to train reward models for the automatic evaluation of translation quality. Second, to enhance both singability and translation quality, we implement a two-stage training process with filtering techniques. Finally, we introduce an inference-time optimization framework for translating entire songs. Extensive experiments, including both automatic and human evaluations, demonstrate significant improvements over baseline methods and validate the effectiveness of each component in our approach.
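Length is one of the singability constraints named above, and a length-based filter over candidate translations is easy to sketch. The example below is a hypothetical illustration, not the paper's filtering technique: it assumes a Chinese target language, where each Han character is sung as one syllable, so the syllable count is approximated by counting CJK Unified Ideographs. Rhyme checking is not covered.

```python
import re
from typing import List

# Basic CJK Unified Ideographs block; each character is one Mandarin syllable.
HAN_CHAR = re.compile("[\u4e00-\u9fff]")


def syllable_count_zh(line: str) -> int:
    """Approximate the number of sung syllables in a Chinese line (punctuation ignored)."""
    return len(HAN_CHAR.findall(line))


def filter_by_length(candidates: List[str], target_syllables: int, tolerance: int = 0) -> List[str]:
    """Keep candidate translations whose syllable count is within `tolerance` of the
    number of notes in the melody line (`target_syllables`, a hypothetical input)."""
    return [c for c in candidates if abs(syllable_count_zh(c) - target_syllables) <= tolerance]


candidates = ["你好，世界", "你好", "我爱这个世界"]
print(filter_by_length(candidates, target_syllables=4))
```

A hard length filter like this trades translation quality for singability; allowing a small `tolerance` is one way to keep more semantically faithful candidates in play.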

lyric, translation, translation quality, (16 more...)

arXiv.org Artificial Intelligence

2410.22066

Country:

Asia > China > Hong Kong (0.04)
North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
(6 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

What is the Best Way for ChatGPT to Translate Poetry?

Wang, Shanshan, Wong, Derek F., Yao, Jingming, Chao, Lidia S.

arXiv.org Artificial Intelligence, Jun-5-2024

Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small-sample scenarios to ascertain optimal performance. Despite promising outcomes, our analysis reveals persistent issues in the translations generated by ChatGPT that warrant attention. To address these shortcomings, we propose an Explanation-Assisted Poetry Machine Translation (EAPMT) method, which leverages a monolingual explanation of the poem as guiding information for the translation process. Furthermore, we refine existing evaluation criteria to better suit the nuances of modern poetry translation. We engaged a panel of professional poets for assessments, complemented by evaluations using GPT-4. The results from both human and machine evaluations demonstrate that our EAPMT method outperforms traditional ChatGPT translation methods and existing online systems. This paper validates the efficacy of our method and contributes a novel perspective to machine-assisted literary translation.
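The explanation-assisted idea can be pictured as a two-step prompting pipeline: first ask the model for a monolingual explanation of the poem, then ask for a translation guided by that explanation. This is a minimal sketch under stated assumptions, not the authors' prompts: `ask` is a hypothetical stand-in for any LLM call (it is not a real API), and the prompt wording is invented for illustration.

```python
from typing import Callable


def explanation_assisted_translate(
    poem: str,
    ask: Callable[[str], str],  # hypothetical LLM call: prompt in, completion out
    source_lang: str = "English",
    target_lang: str = "Chinese",
) -> str:
    """Two-step sketch: explain the poem monolingually, then translate with that guidance."""
    # Step 1: monolingual explanation in the source language.
    explanation = ask(
        f"Explain the following {source_lang} poem in {source_lang}, "
        f"covering its imagery, tone and meaning:\n\n{poem}"
    )
    # Step 2: translation prompt that includes the explanation as guiding information.
    translation_prompt = (
        f"Translate the following {source_lang} poem into {target_lang} poetry.\n"
        f"Use this explanation as guidance:\n{explanation}\n\n"
        f"Poem:\n{poem}"
    )
    return ask(translation_prompt)


# Usage with a fake model that records prompts, so the flow can be inspected offline.
prompts = []


def fake_ask(prompt: str) -> str:
    prompts.append(prompt)
    return "EXPLANATION" if len(prompts) == 1 else "TRANSLATION"


print(explanation_assisted_translate("The woods are lovely, dark and deep", fake_ask))
```

Injecting `ask` as a parameter keeps the pipeline independent of any particular model provider and makes it testable without network access.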

poem, poetry, translation, (14 more...)

arXiv.org Artificial Intelligence

2406.0345

Country:

Asia > Macao (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating the Capability of ChatGPT on Ancient Chinese

Zhou, Siqing, Si, Shijing

arXiv.org Artificial Intelligence, Dec-23-2023

ChatGPT's proficiency in handling modern standard languages suggests potential for its use in understanding ancient Chinese. This project explores ChatGPT's capabilities on ancient Chinese via two tasks: translating ancient Chinese into modern Chinese and recognizing ancient Chinese names. A comparison of ChatGPT's output with human translations serves to evaluate its comprehension of ancient Chinese. The findings indicate that: (1) ChatGPT's proficiency in ancient Chinese has yet to reach a satisfactory level; (2) ChatGPT performs best on ancient-to-modern translation when fed three context sentences. To help reproduce our work, we provide the Python code snippets used in this study.
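Supplying context sentences amounts to building a prompt that includes a window of neighbouring sentences before the sentence to be translated. The sketch below is illustrative only and is not the study's code: it assumes the context consists of the `k` sentences immediately preceding the target (the abstract does not say whether preceding or surrounding sentences were used), and the instruction wording is hypothetical.

```python
from typing import List


def build_context_prompt(sentences: List[str], index: int, k: int = 3) -> str:
    """Prompt for translating sentences[index] from ancient to modern Chinese,
    with up to k preceding sentences as context (an assumption; see lead-in)."""
    if not 0 <= index < len(sentences):
        raise IndexError("index out of range")
    context = sentences[max(0, index - k):index]
    lines = [
        "Translate the final sentence from ancient Chinese into modern Chinese, "
        "using the context sentences for reference."
    ]
    if context:
        lines.append("Context:")
        lines.extend(context)
    lines.append("Sentence:")
    lines.append(sentences[index])
    return "\n".join(lines)


passage = ["学而时习之，不亦说乎？", "有朋自远方来，不亦乐乎？", "人不知而不愠，不亦君子乎？"]
print(build_context_prompt(passage, 2))
```

Varying `k` and scoring the resulting translations against human references is one way to reproduce the kind of context-size comparison the abstract reports.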

ancient chinese, chatgpt, translation, (14 more...)

arXiv.org Artificial Intelligence

2312.15304

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards General Error Diagnosis via Behavioral Testing in Machine Translation

Wu, Junjie, Liu, Lemao, Yeung, Dit-Yan

arXiv.org Artificial Intelligence, Oct-20-2023

Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing the capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging because it generally requires human effort to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing works on behavioral testing of MT systems circumvent this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as the incorrect translation of single numeric or currency words. To diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudo-references. Experimental results on various MT systems demonstrate that BTPGBT can provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https://github.com/wujunjie1998/BTPGBT.

mt system, test case, translation, (14 more...)

arXiv.org Artificial Intelligence

2310.13362

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation

Guo, Geyang, Yang, Jiarong, Lu, Fengyuan, Qin, Jiaxin, Tang, Tianyi, Zhao, Wayne Xin

arXiv.org Artificial Intelligence, Jul-31-2023

Interpreting ancient Chinese has been the key to comprehending vast Chinese literature, tradition, and civilization. In this paper, we propose Erya for ancient Chinese translation. From a dataset perspective, we collect, clean, and classify ancient Chinese materials from various sources, forming the most extensive ancient Chinese resource to date. From a model perspective, we devise the Erya training method, oriented towards ancient Chinese. We design two jointly working tasks: disyllabic aligned substitution (DAS) and dual masked language model (DMLM). From an evaluation perspective, we build a benchmark to judge ancient Chinese translation quality in different scenarios and evaluate the ancient Chinese translation capacities of various existing models. Our model exhibits remarkable zero-shot performance across five domains, scoring over 12.0 BLEU higher than GPT-3.5 models and achieving better human evaluation results than ERNIE Bot. Subsequent fine-tuning further shows the superior transfer capability of the Erya model, with a gain of 6.2 BLEU. We release all the above-mentioned resources at https://github.com/RUCAIBox/Erya.
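The gains above are reported in BLEU. For readers unfamiliar with the metric, the sketch below computes an unsmoothed, sentence-level BLEU over characters (a common tokenization choice for Chinese): the geometric mean of modified 1- to 4-gram precisions, multiplied by a brevity penalty and scaled to 0-100. It is an illustration of the standard formula, not the paper's evaluation script; published results are normally computed at corpus level with a standard tool such as sacreBLEU, so numbers from this sketch are not comparable to the reported scores.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU on character tokens, scaled to 0-100."""
    hyp, ref = list(hypothesis), list(reference)
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
        total = sum(hyp_ngrams.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(log_precision_sum)


print(round(char_sentence_bleu("学而时习", "学而时习之"), 2))
```

Here every n-gram of the hypothesis matches, so the score below 100 comes entirely from the brevity penalty for the missing final character.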

large language model, machine learning, translation, (16 more...)

arXiv.org Artificial Intelligence

2308.0024

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Mongolia (0.04)
Asia > China > Inner Mongolia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Quantifying syntax similarity with a polynomial representation of dependency trees

Liu, Pengyu, Feng, Tinghao, Liu, Rui

arXiv.org Artificial Intelligence, Nov-13-2022

Dependency grammar focuses on the proximity of words in a sentence, and the hierarchical relations between words in the sentence are represented by a tree structure called the dependency tree of the sentence. Recently, an international collaboration project called Universal Dependencies (UD) has created a standard annotation scheme for constructing dependency trees from sentences, and hundreds of UD treebanks of various languages have been made publicly available [7]. These datasets form key materials for syntax analysis, providing new opportunities for automated text processing and syntactic typology studies, to name a few. Parallel Universal Dependencies (PUD) treebanks are a class of UD treebanks consisting of the dependency trees of 1,000 sentences and their translations into other languages [33]. The 1,000 sentences are randomly selected from the news domain and Wikipedia and were originally written in English, French, German, Italian, or Spanish. At the time of writing, there are 20 PUD treebanks containing the dependency trees of the 1,000 sentences in 20 languages, respectively. These UD treebanks have stimulated novel computational methods for syntax analysis and the development of quantitative measures of syntax similarity [19, 31, 32]. However, current methods for describing dependency trees mainly focus on partial syntactic information recorded in the structures, such as word order and dependency distance [2, 3, 11, 18]. In this work, we introduce a comprehensive representation of dependency trees based on a tree distinguishing polynomial.
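A tree polynomial of this kind can be computed recursively from a sentence's head indices. The sketch below assumes the recursive definition of Liu's tree distinguishing polynomial as we understand it: a leaf maps to `x`, and an internal node maps to `y` plus the product of its children's polynomials. It ignores dependency relation labels and word order, and the exact variant used in the paper may differ. Heads follow the CoNLL-U `HEAD` convention (1-based token indices, `0` for the root). Polynomials are stored as dictionaries from `(x exponent, y exponent)` to integer coefficients, so no third-party algebra library is needed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Poly = Dict[Tuple[int, int], int]  # (exponent of x, exponent of y) -> coefficient


def poly_mul(p: Poly, q: Poly) -> Poly:
    """Multiply two bivariate polynomials in x and y."""
    out: Dict[Tuple[int, int], int] = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return dict(out)


def tree_polynomial(heads: List[int]) -> Poly:
    """Polynomial of the dependency tree given by CoNLL-U-style heads.

    heads[i] is the head of token i+1 (1-based); 0 marks the root.
    Assumes a well-formed tree (single root, no cycles).
    """
    children: Dict[int, List[int]] = defaultdict(list)
    roots = []
    for token, head in enumerate(heads, start=1):
        if head == 0:
            roots.append(token)
        else:
            children[head].append(token)
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    def poly_of(node: int) -> Poly:
        if not children[node]:
            return {(1, 0): 1}  # a leaf contributes x
        product: Poly = {(0, 0): 1}
        for child in children[node]:
            product = poly_mul(product, poly_of(child))
        product[(0, 1)] = product.get((0, 1), 0) + 1  # internal node: y + product
        return product

    return poly_of(roots[0])


# "She reads books": 'reads' (token 2) is the root with two leaf dependents -> x^2 + y
print(tree_polynomial([2, 0, 2]))
```

Because the polynomial depends only on tree shape, two sentences in different languages can be compared structurally by comparing their coefficient dictionaries, which is the kind of syntax-similarity use the abstract motivates.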

artificial intelligence, natural language, text processing, (17 more...)

arXiv.org Artificial Intelligence

2211.07005

Country:

North America > United States > California > Yolo County > Davis (0.14)
Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
(12 more...)

Genre: Research Report (0.50)

Industry:

Government (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Word Sense Disambiguation for All Words Without Hard Labor

Zhong, Zhi (National University of Singapore) | Ng, Hwee Tou (National University of Singapore)

AAAI Conferences, Jun-23-2009

While the most accurate word sense disambiguation systems are built using supervised learning from sense-tagged data, scaling them up to all words of a language has proved elusive, since preparing a sense-tagged corpus for all words of a language is time-consuming and labor-intensive. In this paper, we propose and implement a completely automatic approach to scale up word sense disambiguation to all words of English. Our approach relies on English-Chinese parallel corpora, English-Chinese bilingual dictionaries, and automatic methods of finding synonyms of Chinese words. No additional human sense annotations or word translations are needed. We conducted a large-scale empirical evaluation on more than 29,000 noun tokens in English texts annotated in OntoNotes 2.0, based on its coarse-grained sense inventory. The evaluation results show that our approach achieves high accuracy, outperforming the first-sense baseline and coming close to a previously reported approach that requires manual human effort to provide Chinese translations of English senses.
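The core intuition of using parallel text for sense tagging is that the Chinese word aligned to an English occurrence often reveals its sense. The sketch below is a simplified, hypothetical illustration of that labeling step, not the paper's pipeline: it assumes word alignment has already been done, and that each English sense has a set of Chinese translations (from a bilingual dictionary, optionally expanded with synonyms). The sense identifiers and data are invented for the example. An instance is labeled only when its aligned Chinese word maps to exactly one sense.

```python
from typing import Dict, List, Optional, Set, Tuple


def label_instance(aligned_zh: str, sense_translations: Dict[str, Set[str]]) -> Optional[str]:
    """Return the unique sense whose Chinese translations include the aligned word, else None."""
    matches = [sense for sense, words in sense_translations.items() if aligned_zh in words]
    return matches[0] if len(matches) == 1 else None


def build_training_data(
    instances: List[Tuple[str, str]],  # (English context, aligned Chinese word)
    sense_translations: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Automatically sense-tag parallel instances; unmatched or ambiguous ones are skipped."""
    labeled = []
    for context, zh in instances:
        sense = label_instance(zh, sense_translations)
        if sense is not None:
            labeled.append((context, sense))
    return labeled


# Hypothetical sense inventory for the noun "bank".
bank_senses = {
    "bank%finance": {"银行"},
    "bank%river": {"河岸", "岸"},
}
parallel = [
    ("She deposited money at the bank.", "银行"),
    ("They sat on the bank of the river.", "岸"),
    ("He walked along the bank.", "岸边"),  # not in the dictionary: skipped
]
print(build_training_data(parallel, bank_senses))
```

The third instance shows why synonym expansion matters: adding a synonym such as 岸边 to the river sense's translation set would let that example be labeled too, yielding more training data without human annotation.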

chinese translation, parallel text, synset, (15 more...)

AAAI Conferences

Twenty-First International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.05)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.05)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.50)
